ABSTRACT

We present an energy-efficient, utility accrual, real-time
scheduling algorithm called the Resource-constrained
Energy-Efficient Utility Accrual Algorithm (or ReUA). ReUA
considers an application model where activities are subject
to time/utility function (TUF) time constraints, resource
dependencies including mutual exclusion constraints, and
statistical performance requirements including activity
(timeliness) utility bounds that are probabilistically satisfied.
Further, ReUA targets mobile embedded systems where
system-level energy consumption is also a major concern.

For  such  a  model,  we  consider  the  scheduling  objectives 
of  (1)  satisfying  the  statistical  performance  requirements; 
and  (2)  maximizing  the  system-level  energy  efficiency.  At 
the  same  time,  resource  dependencies  must  be  respected. 
Since the problem is NP-hard, ReUA makes resource
allocations using statistical properties of application cycle
demands and heuristically computes schedules with a
polynomial-time cost. We analytically establish several
timeliness and non-timeliness properties of the algorithm.
Further, our simulation experiments illustrate the
algorithm’s effectiveness.

1.  INTRODUCTION 

Energy consumption has become one of the primary
concerns in electronic system design due to the recent
popularity of portable devices and the environmental
concerns related to desktops and servers. For mobile and
portable embedded systems, minimizing energy consumption
results in longer battery life. But intelligent devices
usually need powerful processors, which consume more
energy than those in simpler devices, thus reducing battery
life. This fundamental tradeoff between performance and
battery life is critically important and has been addressed
in the past [16, 29].

Saving energy without substantially affecting application
performance is crucial for embedded real-time systems
that are mobile and battery-powered, because most
real-time applications running on energy-limited systems
inherently impose temporal constraints on the sojourn
time [5].

Dynamic voltage scaling (DVS) is a common mechanism
studied in the past to save CPU energy [5, 12, 14, 15, 25,
30, 31, 37]. DVS addresses the trade-off between
performance and battery life by taking into account two
important characteristics of most current computer systems:
(1) For CMOS-based processors, the maximum
clock frequency scales almost linearly with the power supply
voltage, and the energy consumed per cycle is proportional
to the square of the voltage [7]; and (2) the peak
computing rate needed is much higher than the average
throughput that must be sustained. A lower frequency
(i.e., speed) hence enables a lower voltage and yields a
quadratic energy reduction, at the expense of roughly
linearly increased sojourn time [13].

Figure 1: Example Time Constraints Specified Using Time/Utility Functions

Most of the past efforts on energy-efficient real-time
scheduling focus on the deadline time constraint and
deadline-based timeliness optimality criteria such as meeting all or
some percentage of deadlines [5, 13, 31, 36]. Further, past
efforts focus on resource-independent activities, i.e.,
activities that do not access shared resources, which are
subject to mutual exclusion constraints. For the optimality
criterion of meeting all deadlines, past DVS schemes
focus on minimizing energy consumption of the CPU, while
meeting the deadlines of all (independent) activities.

1.1  Soft  Timeliness  Optimality 

In this paper, we focus on dynamic, adaptive, embedded
real-time control systems at any level(s) of an enterprise,
e.g., devices in the defense domain such as multi-mode
phased array radars [2] and battle management [1]. Such
embedded systems include “soft” time constraints (besides
hard) in the sense that completing an activity at any time
will result in some (positive or negative) utility to the
system, and that utility depends on the activity’s completion
time. Moreover, they often desire a soft timeliness
optimality criterion, such as completing all time-constrained
activities as close as possible to their optimal completion
times so as to yield maximal collective utility.

Jensen’s time/utility functions [20] (or TUFs) allow the
semantics of soft time constraints to be precisely specified.
A TUF, which is a generalization of the deadline
constraint, specifies the utility to the system resulting
from the completion of an activity as a function of its
completion time. Figure 1 shows examples of time
constraints specified using TUFs. Figures 1(a), 1(b), and 1(c)
show time constraints of two large-scale, dynamic,
embedded real-time applications specified using TUFs. The
applications include: (1) the AWACS (Airborne WArning
and Control System) surveillance mode tracker system [9]
built by The MITRE Corporation and The Open Group
(TOG); and (2) a coastal air defense system [28] built by
General Dynamics (GD) and Carnegie Mellon University
(CMU).

Figure 1(a) shows the TUF of the track association
activity of the AWACS; Figures 1(b) and 1(c) show TUFs
of three activities of the coastal air defense system called
plot correlation, track maintenance, and missile control.
Note that Figure 1(c) shows how the TUF of the missile
control activity dynamically changes as the guided
interceptor missile approaches its target.

The classical deadline constraint is a binary-valued,
downward “step” shaped TUF. This is shown in Figure 1(d).

When timing constraints are expressed with TUFs, the
scheduling optimality criteria are based on maximizing
accrued utility from those activities, e.g., maximizing the
sum, or the expected sum, of the activities’ attained
utilities. Such criteria are called Utility Accrual (or UA)
criteria, and sequencing (scheduling, dispatching) algorithms
that consider UA criteria are called UA sequencing
algorithms. In general, other factors may also be included in
the optimality criteria, such as resource dependencies and
precedence constraints. Several UA scheduling algorithms
are presented in the literature [8, 10, 21, 22, 24, 35].
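
To make the TUF and UA notions above concrete, the following is a minimal Python sketch, not part of the original paper. The helper names (`step_tuf`, `linear_tuf`, `accrued_utility`) and the numeric values are hypothetical; times are assumed to be measured relative to each activity's arrival.

```python
from typing import Callable, Sequence

# A TUF maps an activity's completion time to the utility accrued.
TUF = Callable[[float], float]

def step_tuf(deadline: float, utility: float = 1.0) -> TUF:
    """Classical deadline: full utility up to the deadline, zero afterwards."""
    return lambda t: utility if t <= deadline else 0.0

def linear_tuf(max_utility: float, termination: float) -> TUF:
    """Non-increasing TUF: decays linearly from max_utility to 0 at termination."""
    return lambda t: max(0.0, max_utility * (1.0 - t / termination))

def accrued_utility(tufs: Sequence[TUF], completions: Sequence[float]) -> float:
    """UA criterion (sum form): total utility attained by all activities."""
    return sum(u(t) for u, t in zip(tufs, completions))

# Hypothetical example: one deadline-style activity, one linearly decaying one.
tufs = [step_tuf(10.0), linear_tuf(4.0, 8.0)]
total = accrued_utility(tufs, [9.0, 2.0])
```

A UA scheduler would compare candidate schedules by the value of such a sum (or its expectation), rather than by the number of deadlines met.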

1.2  System-Level  Energy  Consumption 

Most of the past work on energy-efficient real-time
scheduling using DVS only considers the energy consumed by the
CPU. However, the battery life of a system is determined
by the system’s energy consumption, not just the energy
consumption of the CPU. Therefore, the CPU-only energy
consumption models used in past efforts do not accurately
capture what determines battery life.

Based on the experimental observations that some
components in computer systems consume constant energy
and some consume energy that scales only with frequency
(i.e., voltage), Martin proposed a system-level energy
consumption model in [26, 27]. In this model, the system-level
energy consumption per cycle does not scale quadratically
with the CPU frequency. Instead, a polynomial is used to
represent the relation. We further elaborate on this
energy model in Section 2.6.

1.3  Contributions  and  Paper  Outline 

As mentioned previously, almost all of the past efforts
on energy-efficient real-time scheduling consider
deadline-based timeliness optimality criteria.^ Further, to the best
of our knowledge, no past effort (on energy-efficient
real-time scheduling) considers activities that share resources,
which are subject to mutual exclusion constraints.
Resource dependencies are important for many embedded
systems, as many such systems use shared resources and
simultaneously access them for application
progress [19].

UA scheduling under resource dependencies has been
studied in the past [10, 22]. But energy-efficient UA
scheduling has not been studied. Further, all past UA
scheduling algorithms maximize the collective utility attained by
all activities. They provide no assurance on individual
timeliness behavior such as a lower bound on individual
activity utility that is probabilistically satisfied.

As mentioned previously, most of the past efforts on
energy-efficient real-time scheduling only consider the CPU’s
energy consumption and do not consider the system’s
energy consumption. The PA-BTA algorithm [38] considers
system-level energy consumption, but it is restricted to
independent activities and provides no assurances,
individual or collective, on timeliness behavior.

^The only exception is the PA-BTA algorithm [38].
However, PA-BTA is restricted to independent activities.

In this paper, we consider the problem that intersects:
(1) UA scheduling under TUF time constraints, providing
assurances on timeliness behavior; (2) activity scheduling
respecting resource dependencies; and (3) CPU scheduling
for reduced system-level energy consumption.

We consider application activities that are subject to
TUF time constraints, resource dependencies including
mutual exclusion constraints, and statistical performance
requirements including lower bounds on individual
activity utilities that are probabilistically satisfied. Further,
we consider a system-level energy consumption model.

We  integrate  run-time-based  DVS  [14,  25,  31]  with  UA 
scheduling  using  a  single  system-level  performance  metric 
called  Utility  and  Energy  Ratio  (or  UER).  UER  facilitates 
optimization  of  timeliness  objectives  and  energy  efficiency 
in  a  unified  way. 

Given the metric of UER, our scheduling objective is
two-fold: (1) satisfy the lower bounds on individual
activity utilities; and (2) maximize the system’s UER. This
problem has not been studied in the past and is NP-hard.

We present a polynomial-time, heuristic algorithm for
this problem called the Resource-constrained Energy-Efficient
Utility Accrual Algorithm (or ReUA). We analytically
establish several timeliness and non-timeliness properties
of the algorithm including timeliness optimality during
under-loads, sufficiency on probabilistic satisfaction of
timeliness lower bounds, deadlock-freedom, and correctness.

We also evaluate ReUA’s performance through
simulation. Our simulation studies reveal that ReUA provides
statistical performance guarantees (as opposed to worst
case) on activity timeliness behavior. Further, the
algorithm improves system-level energy-efficiency.

Thus, the contribution of the paper is the ReUA
algorithm. To the best of our knowledge, no other
effort solves the problem solved by ReUA.

The rest of the paper is organized as follows. In
Section 2, we outline our activity, resource, and timeliness
models, and state the UA scheduling criterion. We present
ReUA in Section 3. In Section 4, we establish the
algorithm’s timeliness and non-timeliness properties.
Section 5 discusses the simulation studies. Finally, we
conclude the paper in Section 6.

2.  MODELS  AND  OBJECTIVES 

2.1  Tasks  and  Jobs 

We consider the application to consist of a set of tasks,
denoted as $T = \{T_1, T_2, \ldots, T_n\}$. Each task $T_i$ has a
number of instances, and these instances may be released
either periodically or sporadically with a known minimal
inter-arrival time. The period or minimal inter-arrival
time of a task $T_i$ is denoted as $P_i$.

An instance of a task is called a job, and we refer to
the $j$th job of task $T_i$, which is also the $j$th invocation of
$T_i$, as $J_{i,j}$. The basic scheduling entity that we consider
is the job abstraction. Thus, we use $J$ to denote a job
without being task specific, as seen by the scheduler at
any scheduling event; $J_k$ can be used to represent a job
in the job scheduling queue. Jobs can be preempted at
arbitrary times.

2.2  Resource  Model 

Jobs can access non-CPU resources, which, in general,
are serially reusable. Examples include physical resources
(e.g., disks) and logical resources (e.g., critical sections
guarded by mutexes).

Similar  to  fixed-priority  resource  access  protocols  (e.g., 
priority  inheritance,  priority  ceiling)  [32]  and  that  for 
UA  algorithms  [10,22],  we  consider  a  single-unit  resource 
model.  Thus,  only  a  single  instance  of  a  resource  is  present 
and  a  job  must  explicitly  specify  the  resource  that  it  wants 
to  access. 


Resources can be shared and can be subject to mutual
exclusion constraints. Thus, only a single job can be
accessing such resources at any given time.

A job may request multiple shared resources during
its lifetime. The requested time intervals for holding
resources may be nested, overlapped, or disjoint. We assume
that a job explicitly releases all granted resources before
the end of its execution.

Jobs of different tasks can have precedence constraints.
For example, a job $J_k$ can become eligible for execution
only after a job $J_i$ has completed, because $J_k$ may require
$J_i$’s results. As in [10, 22], we allow such precedences to
be programmed as resource dependencies.

2.3  Timeliness  Model 

A job’s time constraint is specified using a TUF.
Following [18], a time constraint usually has a “scope,” i.e., a
segment of the job control flow that is associated with a
time constraint. We call such a scope a “scheduling
segment.”

Different jobs of a task have the same TUF. Thus, we
use $U_i(\cdot)$ to denote task $T_i$’s TUF. The TUF of task $T_i$’s
$j$th job is denoted as $U_{i,j}(\cdot)$, which has the same shape as
$U_i(\cdot)$. Without being task specific, we use $U_{J_k}$ to denote
the TUF of a job $J_k$; thus completion of the job $J_k$ at a
time $t$ will yield a utility $U_{J_k}(t)$.

TUFs can be classified into unimodal and multimodal
functions. Unimodal TUFs are those for which any
decrease in utility cannot be followed by an increase.
Examples are shown in Figure 1. TUFs that are not
unimodal are multimodal. In this paper, we restrict our focus
to non-increasing, unimodal TUFs, i.e., those unimodal
TUFs for which utility never increases as time advances.
Figures 1(a), 1(b), and 1(d) show examples. We
justify this restriction in Section 2.4.

Each TUF $U_{i,j}$, $i \in \{1, \ldots, n\}$, has an initial time $I_{i,j}$
and a termination time $X_{i,j}$. Initial and termination
times are the earliest and the latest times for which the
TUF is defined, respectively. We assume that $I_{i,j}$ is equal
to the arrival time of job $J_{i,j}$, and $X_{i,j} - I_{i,j}$ is equal to
the period or minimal inter-arrival time $P_i$ of the task $T_i$.

If $J_{i,j}$’s $X_{i,j}$ is reached and execution of the
corresponding job has not been completed, an exception is raised.
Normally, this exception will cause $J_{i,j}$’s abortion and
execution of exception handlers.

2.4  Statistical  Timeliness  Performance  Requirement

Each task needs to accrue some percentage of its
maximum possible utility. The statistical performance
requirement of a task $T_i$ is denoted as $\{\nu_i, p_i\}$, which implies
that task $T_i$ should accrue at least $\nu_i$ percentage of its
maximum possible utility with the probability $p_i$. This is
also the requirement for each job of the task $T_i$. Thus, for
example, if $\{\nu_i, p_i\} = \{0.7, 0.93\}$, then the task $T_i$ needs
to accrue at least 70% of the maximum possible utility
with a probability no less than 93%. For step TUFs, $\nu_i$
can only take the value 0 or 1.

This  statistical  performance  requirement  on  the  utility 
of  a  task  implies  a  corresponding  requirement  on  the  range 
of  task  sojourn  times.  For  non-increasing  unimodal  TUFs, 
this  range  is  decided  only  by  an  upper  bound,  while  for 
increasing  unimodal  TUFs,  both  a  lower  bound  and  an 
upper  bound  are  needed.  In  this  paper,  we  care  about  the 
upper  bound.  For  this  reason,  we  focus  on  non-increasing 
TUFs. 

2.5  Task  Cycle  Demands 

UA scheduling and DVS are both dependent on the
prediction of task cycle demands. We estimate the
statistical properties (e.g., distribution, mean, variance) of the
demand rather than the worst-case demand for three
reasons: (1) Many embedded real-time applications exhibit
a large variation in their actual workload [9]. Thus, the
statistical estimation of the demand is much more
stable and hence more predictable than that of the actual
workload; (2) worst-case workload information is usually
a very conservative prediction of the actual workload [5].
Such conservatism usually results in resource over-supply,
which exacerbates the power consumption problem; and
(3) allocating cycles based on the statistical estimation of
tasks’ demands can provide statistical performance
guarantees. This is sufficient for the applications of interest
to us. In fact, stronger guarantees are generally infeasible
for dynamic, embedded real-time systems.

Let $Y_i$ be the random variable of a task $T_i$’s cycle
demand. Estimating the demand distribution of the task
involves two steps: (1) profiling its cycle usage and (2)
deriving the probability distribution of the usage. Recently,
a number of measurement-based profiling mechanisms have
been proposed [4, 33, 39]. Profiling can be performed
online or off-line. Off-line profiling provides more accurate
estimation with the whole trace of CPU usage, but it is
not applicable to “live” applications.

We assume that the mean and variance of task cycle
demands are finite and determined through either online
or off-line profiling. We denote the expected workload
of a task $T_i$ in variable voltage/speed settings, i.e., the
expected number of processor cycles required by $T_i$,
as $E(Y_i)$, and the variance of the workload as $Var(Y_i)$.
Note that, under a constant speed, i.e., frequency $f$ (given
in cycles per second), the expected execution time of a
task $T_i$ is given by $e_i = E(Y_i)/f$.
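
As an illustration of the estimation described in this section, the sketch below derives the mean and variance of a task's cycle demand from profiled samples and converts the mean into an expected execution time at a fixed frequency. It is a minimal sketch, not the paper's profiler: the function names and the sample values are hypothetical, and the samples are assumed to be per-job cycle counts obtained by some external profiling mechanism.

```python
from statistics import fmean, pvariance

def demand_statistics(samples):
    """Return (mean, variance) of profiled per-job cycle counts for one task."""
    mean = fmean(samples)
    return mean, pvariance(samples, mu=mean)

def expected_execution_time(mean_cycles, frequency_hz):
    """Expected execution time in seconds at a constant frequency (cycles/s)."""
    return mean_cycles / frequency_hz

# Hypothetical profile of four jobs of one task, in cycles.
samples = [9.0e6, 10.0e6, 11.0e6, 10.0e6]
mean, var = demand_statistics(samples)
e_i = expected_execution_time(mean, 200e6)  # assumed 200 MHz clock
```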

2.6  Energy  Consumption  Model 

We consider Martin’s system-level energy consumption
model that was derived from experimental observations
that some components of a computer consume constant
power, while others consume power that is scalable to
either voltage or frequency [26, 27, 36]. We use this model
to derive the energy consumption per cycle. This is
summarized as follows:

The CPU is assumed to be capable of executing tasks
at $m$ clock frequencies. When the CPU operates at a
frequency $f$, the CPU’s dynamic power consumption,
denoted as $P_d$, is given by $P_d = C_{ef} \times V_{dd}^2 \times f$, where $C_{ef}$
is the effective switch capacitance and $V_{dd}$ is the supply
voltage. On the other hand, the clock frequency is
almost linearly related to the supply voltage, since
$f = k \times (V_{dd} - V_t)^2 / V_{dd}$, where $k$ is constant and $V_t$ is the
threshold voltage [40]. By approximation, $f = a \times V_{dd}$, where $a$ is
constant. Thus, $P_d = (C_{ef}/a^2) \times f^3$, which is equivalent to
$P_d = S_3 \times f^3$, where $S_3$ is constant. In this case, both the
supply voltage and the clock frequency can be scaled.

Besides the CPU, there are also other system
components that consume energy. Given the dynamic power
consumption equation $P_d = C_{ef} \times V_{dd}^2 \times f$, power
consumption equations for all other system components can
be derived. Some components in the system must
operate at a fixed voltage and thus their power can only scale
with frequency. Examples include main memory. In this
case, $C_{ef} \times V_{dd}^2$ can be represented as another constant
such as $S_1$, and the equation becomes $P_d = S_1 \times f$. Other
components in the system consume constant power with
respect to the CPU clock frequency. Examples include
display devices. Thus, their power consumption can be
represented as $S_0$, where $S_0$ is constant.

Finally, for completeness in fitting the measured power
of a system to the cubic equation, another term is included
to represent the quadratic term, i.e., $P_d = S_2 \times V_{dd}^2$. Since
$f$ is almost linearly related to $V_{dd}$, $P_d$ is represented as
$P_d = S_2 \times f^2$. While this term does not represent the
dynamic power consumption of CMOS, because it
implies that $V_{dd}$ is being lowered without also lowering $f$,
in practice, this term may appear because of variations
in DC-DC regulator efficiency across the range of output
power, CMOS leakage currents, and other second order
effects [26].

Summing the power consumption of all system
components together, a single equation for the system-level
power consumption can be obtained as:
$P = S_3 \times f^3 + S_2 \times f^2 + S_1 \times f + S_0$,
where $f$ is the CPU clock frequency
and $S_0$, $S_1$, $S_2$, and $S_3$ are system parameters. The
corresponding energy consumption of a task $T_i$ is given by
$E_i = P \times e_i$, where $e_i$ denotes $T_i$’s expected execution
time. Since $e_i = E(Y_i)/f$, the expected energy consumption per
cycle is given by:

$$E(f) = S_3 \times f^2 + S_2 \times f + S_1 + \frac{S_0}{f} \qquad (1)$$
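
Equation 1 can be evaluated directly over the discrete set of available frequencies to see where the per-cycle energy is lowest. The following is a minimal sketch under stated assumptions: the parameter values for $S_0$ to $S_3$ and the frequency set are hypothetical (normalized units), and the function names are ours, not the paper's.

```python
def energy_per_cycle(f, s3, s2, s1, s0):
    """Equation 1: E(f) = S3*f^2 + S2*f + S1 + S0/f (f must be positive)."""
    return s3 * f**2 + s2 * f + s1 + s0 / f

def most_efficient_frequency(freqs, s3, s2, s1, s0):
    """Frequency in freqs with the smallest system-level energy per cycle."""
    return min(freqs, key=lambda f: energy_per_cycle(f, s3, s2, s1, s0))

# Hypothetical system: only a cubic CPU term and a constant-power term.
freqs = [100.0, 200.0, 300.0, 400.0]  # assumed frequencies in MHz
best = most_efficient_frequency(freqs, s3=1.0e-6, s2=0.0, s1=0.0, s0=16.0)
```

With a non-zero constant term $S_0$, the $S_0/f$ component penalizes slow execution, so in this example the most efficient frequency is an intermediate one rather than the lowest.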

2.7  Scheduling  Criterion 

Given the models previously described, we consider the
UER metric to integrate timeliness performance and
energy consumption. The UER of a job measures the amount
of utility that can be accrued per unit energy consumption
by executing the job and the job(s) that it depends
upon (due to resource dependencies). A job also has a
Local UER (LoUER), which is defined as the UER that the
job can potentially accrue by itself at the current time, if
it were to continue its execution.

We define the system-level UER as the ratio of the total
accrued utilities and total consumed energy of the system,
i.e., UER = (total accrued utility) / (total consumed energy).

Thus, the ReUA algorithm that we present considers
a two-fold scheduling criterion: (1) assure that each task
$T_i$ accrues the specified percentage $\nu_i$ of its maximum
possible utility with at least the specified probability $p_i$;
and (2) maximize the system-level UER, which implies
the system’s “energy efficiency.”

This problem is NP-hard because it subsumes the
problem of scheduling dependent tasks with step-shaped TUFs,
which has been shown to be NP-hard in [10]. Further, it
has not been previously studied.
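
The system-level UER defined in this section is a plain ratio, which the following minimal sketch computes from per-job results. The function name and the input layout (a list of utility/energy pairs) are hypothetical and chosen only for illustration.

```python
def system_uer(jobs):
    """System-level UER: total accrued utility divided by total consumed energy.

    jobs is a list of (accrued_utility, consumed_energy) pairs, one per job.
    Returns 0.0 when no energy has been consumed (an assumption of this sketch).
    """
    total_utility = sum(u for u, _ in jobs)
    total_energy = sum(e for _, e in jobs)
    return total_utility / total_energy if total_energy > 0 else 0.0

# Hypothetical results for two completed jobs.
uer = system_uer([(3.0, 2.0), (1.0, 2.0)])
```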

3.  THE  REUA  ALGORITHM 
3.1  Determining  Task  Critical  Time 

To assure that tasks accrue their desired utility
percentage and maximize the energy efficiency, ReUA needs
to provide predictable CPU scheduling and speed scaling.

Let $s_{i,j}$ be the sojourn time of the $j$th job of task $T_i$.
Then, we have $\Pr[U_i(s_{i,j}) \geq \nu_i \times U_i^{\max}] \geq p_i$. By
the assumption of non-increasing TUFs, it is sufficient
to have $\Pr[s_{i,j} \leq D_i] \geq p_i$, where $D_i$ is the upper
bound on the sojourn time of task $T_i$. $D_i$ is calculated
as $D_i = U_i^{-1}(\nu_i \times U_i^{\max})$, where $U_i^{-1}(x)$ denotes the
inverse function of TUF $U_i(\cdot)$. If more than one
point on the time axis corresponds to $\nu_i \times U_i^{\max}$,
we choose the latest point. By doing so, we can
potentially reduce the CPU bandwidth demand of a task. We
call $D_i$ “critical time” hereafter. Thus, $T_i$ is
probabilistically guaranteed to accrue at least the utility percentage
$\nu_i = U_i(D_i)/U_i^{\max}$, with probability $p_i$.

Note that the period or minimum inter-arrival time $P_i$
and critical time $D_i$ of the task $T_i$ have the following
relations: (1) $P_i = D_i$ for a binary-valued, downward step
TUF; and (2) $P_i \geq D_i$, for other non-increasing TUFs.
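
The critical time can be computed numerically for any non-increasing TUF. The sketch below is one possible way to do so, using bisection to find the latest time at which the TUF still yields the required utility; it is an illustration under stated assumptions, not the paper's procedure. Times are assumed to be relative to the job's arrival, the TUF's maximum is assumed to occur at time 0, and the function name and example TUFs are hypothetical.

```python
def critical_time(tuf, termination, nu, tol=1e-9):
    """Latest t in [0, termination] with tuf(t) >= nu * tuf(0).

    Assumes tuf is non-increasing on [0, termination] and 0 <= nu <= 1,
    so tuf(0) is the maximum possible utility.
    """
    target = nu * tuf(0.0)
    if tuf(termination) >= target:
        return termination
    lo, hi = 0.0, termination  # invariant: tuf(lo) >= target > tuf(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tuf(mid) >= target:
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical TUFs: linear decay over 100 time units, and a step at 50.
linear = lambda t: 10.0 * (1.0 - t / 100.0)
step = lambda t: 1.0 if t <= 50.0 else 0.0
d_linear = critical_time(linear, 100.0, nu=0.7)  # close to 30
d_step = critical_time(step, 50.0, nu=1.0)       # equals the termination time
```

The step case reproduces relation (1) above, where the critical time coincides with the period, while the linear case gives a critical time strictly earlier than the termination time.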

3.2  Statistical  Estimation  of  Demand 

ReUA’s next step is to decide the number of cycles that
must be allocated to each task. To provide statistical
timeliness guarantees while maximizing energy efficiency,
ReUA allocates cycles based on the statistical
requirements and demand of each task. Knowing the mean and
variance of task $T_i$’s demand $Y_i$, by a one-tailed version
of Chebyshev’s inequality, when $y \geq E(Y_i)$, we have:

$$\Pr[Y_i < y] \geq \frac{(y - E(Y_i))^2}{Var(Y_i) + (y - E(Y_i))^2} \qquad (2)$$

From a probabilistic point of view, Equation 2 is the
direct result of the cumulative distribution function of the
task $T_i$’s cycle demands, i.e., $F_i(y) = \Pr[Y_i \leq y]$. Now,
let $p_i$ be the statistical performance requirement of $T_i$, i.e.,
each job $J_{i,j}$ of task $T_i$ must accrue $\nu_i$ percentage of utility
with a probability $p_i$. To satisfy this requirement, we
set $p_i = \frac{(C_i - E(Y_i))^2}{Var(Y_i) + (C_i - E(Y_i))^2}$ and obtain
$C_i = E(Y_i) + \sqrt{\frac{p_i \times Var(Y_i)}{1 - p_i}}$.

Thus, the scheduler allocates $C_i$ cycles to each job $J_{i,j}$,
so that the probability that job $J_{i,j}$ requires no more than
the allocated $C_i$ cycles is at least $p_i$, i.e., $\Pr[Y_i \leq C_i] \geq p_i$.
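
The allocation rule of this section follows from solving the one-tailed Chebyshev bound for the allocated cycles. A minimal sketch, with hypothetical function names and input values, computes the allocation and checks it against the bound:

```python
import math

def allocate_cycles(mean, variance, p):
    """C_i = E(Y_i) + sqrt(p * Var(Y_i) / (1 - p)), valid for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return mean + math.sqrt(p * variance / (1.0 - p))

def chebyshev_lower_bound(mean, variance, y):
    """One-tailed Chebyshev lower bound on Pr[Y < y], for y >= mean."""
    d = y - mean
    return d * d / (variance + d * d)

# Hypothetical task: mean 10M cycles, standard deviation 2M cycles, p = 0.9.
c = allocate_cycles(mean=10.0e6, variance=4.0e12, p=0.9)
bound = chebyshev_lower_bound(10.0e6, 4.0e12, c)
```

Because the bound holds for any distribution with the given mean and variance, the allocation is distribution-free but can be conservative; as $p$ approaches 1 the allocated cycles grow without limit.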


3.3  UA  Scheduling  with  DVS 

The parameter $C_i$ determines how long (in number of
cycles) to execute each task. We now discuss the other
scheduling dimensions: how fast (i.e., CPU speed scaling)
and when to execute each task.

The intuitive idea is to assign a uniform speed to
execute all tasks until the task set changes. Assume that
there are $n$ tasks and each task is allocated $C_i$ cycles
within its $D_i$. The aggregate CPU demand of the
concurrent tasks is:

$$\sum_{i=1}^{n} \frac{C_i}{D_i} \qquad (3)$$

cycles per second (MHz). To meet this aggregate demand,
the CPU only needs to run at speed $\sum_{i=1}^{n} C_i / D_i$. Equation 3
thus gives the static, optimal CPU speed to minimize the
total energy while meeting all the $D_i$ under the
traditional energy consumption model, assuming that each task
presents its worst-case workload to the processor at every
instance [5].
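
The static speed of Equation 3 can be mapped onto a processor with a discrete set of frequencies by choosing the lowest frequency that covers the aggregate demand. The sketch below illustrates this; the function names, the task parameters, and the fallback to the highest frequency when the demand cannot be met are assumptions of this example.

```python
def aggregate_demand(tasks):
    """Equation 3: sum of C_i / D_i over all tasks, in cycles per second.

    tasks is a list of (allocated_cycles, critical_time_seconds) pairs.
    """
    return sum(c / d for c, d in tasks)

def select_static_frequency(freqs, demand):
    """Lowest available frequency that is >= demand; highest one if none is."""
    for f in sorted(freqs):
        if f >= demand:
            return f
    return max(freqs)

# Hypothetical task set and frequency set (frequencies in cycles per second).
tasks = [(2.0e6, 0.02), (3.0e6, 0.05)]
demand = aggregate_demand(tasks)
speed = select_static_frequency([100e6, 200e6, 300e6], demand)
```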

However,  the  cycle  demands  of  tasks  often  vary  greatly. 
In  particular,  a  task  may,  and  often  does,  complete  a  job 
before  using  up  its  allocated  cycles.  Such  early  completion 
often  results  in  CPU  idle  time,  thereby  wasting  energy.  To 
save  this  energy,  we  need  to  dynamically  adjust  the  CPU 
speed. 

In general, there are two dynamic speed scaling
approaches, namely the conservative approach and the
aggressive approach. The conservative approach assumes
that a job will use its allocated cycles, and starts a job
at the above static optimal speed and then decelerates
when the job completes early. On the other hand, the
aggressive approach assumes that a job will use fewer
cycles than allocated, and starts a job at a lower speed and
then accelerates as the job progresses. The aggressive
approach is adopted in [38] because it saves more energy for
jobs that complete early, and most jobs in its studied
application use fewer cycles than allocated. Similar results
are also shown in [5] and [31].

We consider the energy consumed by the system
instead of that by just the processor and seek to maximize
energy efficiency UER. Equation 1 indicates that there is
an optimal value (not necessarily the lowest one) for clock
frequency that minimizes $E_i$ for a task $T_i$.

We assume that the processor can be operated at $m$
frequencies $\{f_1, f_2, \ldots, f_m \mid f_1 < \cdots < f_m\}$. ReUA first
decides the optimal frequency for each task $T_i$ that
maximizes the task’s local UER. At each scheduling event, for
all the $n'$ jobs $\{J_1, J_2, \ldots, J_{n'}\}$ currently in the
scheduling queue, ReUA sorts them based on their UER under
the highest frequency $f_m$, in a non-increasing order. The
algorithm then inserts the jobs into a tentative schedule
in the order of their critical times (earliest critical time
first), while respecting their resource dependencies.

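We define the system load (Load) as

$$Load = \frac{1}{f_m} \sum_{i=1}^{n} \frac{C_i}{P_i} \qquad (4)$$

and define the critical time-based load (Cload) as

$$Cload = \frac{1}{f_m} \sum_{i=1}^{n} \frac{C_i}{D_i} \qquad (5)$$

For downward step TUFs, Cload = Load.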
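
The two load measures can be computed side by side. The sketch below assumes that Load normalizes the per-period demand $C_i/P_i$, and Cload the per-critical-time demand $C_i/D_i$, by the highest frequency $f_m$; the function names and the task values are hypothetical.

```python
def load(tasks, f_max):
    """System load: sum of C_i / P_i, normalized by the highest frequency."""
    return sum(c / p for c, p, _ in tasks) / f_max

def cload(tasks, f_max):
    """Critical time-based load: sum of C_i / D_i, normalized by f_max."""
    return sum(c / d for c, _, d in tasks) / f_max

# Hypothetical tasks as (C_i cycles, P_i seconds, D_i seconds).
# The second task has P_i = D_i, as for a downward step TUF.
tasks = [(2.0e6, 0.04, 0.02), (3.0e6, 0.05, 0.05)]
system_load = load(tasks, 200e6)
critical_load = cload(tasks, 200e6)
```

Since each critical time is no later than the corresponding period, Cload is never smaller than Load under these assumptions.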

If the system is overloaded, it is possible that the queue
$\{J_1, J_2, \ldots, J_{n'}\}$, whose queue load (Qload) is defined as
$\frac{1}{f_m} \sum_{k=1}^{n'} (C_{T(J_k)} / J_k.X)$, is also overloaded. Note that $J_k.X$
refers to the termination time of $J_k$. Thus, upon inserting
a job, ReUA performs a feasibility check and ensures the
feasibility of the tentative schedule; that is, the predicted
completion time of each job in the tentative schedule never
exceeds its termination time.
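
The feasibility condition stated above can be checked by walking the tentative schedule in execution order and accumulating predicted completion times. The following is a minimal sketch, assuming (for illustration only) that completion times are predicted at the highest frequency and that each job is described by its remaining cycles and absolute termination time; the function name and data layout are hypothetical.

```python
def is_feasible(schedule, now, f_max):
    """True if every job's predicted completion time is within its termination time.

    schedule is a list of (remaining_cycles, termination_time) pairs in
    execution order; now and termination_time are absolute times in seconds.
    """
    t = now
    for cycles, termination in schedule:
        t += cycles / f_max  # predicted completion time of this job
        if t > termination:
            return False
    return True

# Hypothetical tentative schedules on a 100 MHz processor.
ok = is_feasible([(1.0e6, 0.02), (2.0e6, 0.04)], now=0.0, f_max=100e6)
overloaded = is_feasible([(1.0e6, 0.02), (3.0e6, 0.03)], now=0.0, f_max=100e6)
```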

To calculate a CPU frequency for the currently selected
job, i.e., the one at the head of the tentative schedule, we
adopt a stochastic DVS technique similar to the Look-Ahead
EDF (LaEDF) technique discussed in [31]. The
calculated value is compared with the job’s local optimal
frequency, and the higher one is selected as the CPU
frequency. This process is elaborated in Section 3.4.

Intuitively, during overloads the DVS technique is very likely to select the highest frequency $f_m$ for the processor, since the aggregate CPU demand defined in Equation 3 is higher than $f_m$. Therefore, during overloads, with constant energy consumption at frequency $f_m$, maximizing the collective utility per unit energy (our objective) reduces to maximizing the collective utility. This is exactly why we sort the jobs based on their UERs and perform the feasibility check. Such heuristics are explained in detail in the next section.

3.4  Procedural  Description 

3.4.1  Overview 

The  scheduling  events  of  ReUA  include  the  arrival  and 
completion  of  a  job,  a  resource  request,  a  resource  release, 
and  the  expiration  of  a  time  constraint  such  as  the  arrival 
of  the  termination  time  of  a  TUF.  To  describe  ReUA,  we 
define  the  following  variables  and  auxiliary  functions: 

• $T$ is the task set. $D_i^a$ is task $T_i$'s current invocation's absolute critical time; $C_i^r$ denotes its remaining computation cycles for the current job.

• $J_r$ is the current unscheduled job set; $\sigma$ is the ordered schedule. $J_k \in J_r$ is a job; $J_k.Dep$ is its dependency list.

• $J_k.D$ is job $J_k$'s critical time; $J_k.X$ is its termination time; $J_k.C$ is its remaining cycles.

• $T(J_k)$ returns the corresponding task of job $J_k$. Thus, if $T_i = T(J_k)$, then $J_k.C = C_i^r$ and $J_k.D = D_i^a$.

• Owner(R) denotes the jobs that are currently holding resource R; reqRes(T) returns the resource requested by T.

• headOf($\sigma$) returns the first job in $\sigma$; sortByUER($\sigma$) sorts $\sigma$ by each job's UER. selectFreq(x) returns the lowest frequency $f_i \in \{f_1, f_2, \cdots, f_m \mid f_1 < \cdots < f_m\}$ such that $x \le f_i$.

• Insert(T, $\sigma$, I) inserts T in the ordered list $\sigma$ at the position indicated by index I; if there are already entries in $\sigma$ at the index I, T is inserted before them. After insertion, the index of T in $\sigma$ is I.

• Remove(T, $\sigma$, I) removes T from the ordered list $\sigma$ at the position indicated by index I; if T is not present at the position I in $\sigma$, the function takes no action.

• lookup(T, $\sigma$) returns the index value associated with the first occurrence of T in the ordered list $\sigma$.

• feasible($\sigma$) returns a boolean value indicating schedule $\sigma$'s feasibility. For a schedule $\sigma$ to be feasible, the predicted completion time of each job in $\sigma$ must never exceed its termination time. The predicted completion time is calculated under the highest frequency $f_m$.
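
The list helpers above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the schedule is modeled as a list of `(index, job)` pairs kept sorted by index, the function names are lower-cased stand-ins for the paper's auxiliary functions, and returning `None` when the requested value exceeds the highest frequency is a choice made here, not something the text specifies.

```python
import bisect

# Example frequency set in MHz, ascending (the one used in the experiments).
FREQS = [360, 550, 640, 730, 820, 910, 1000]

def select_freq(x, freqs=FREQS):
    """Lowest available frequency f_i with x <= f_i, or None if x exceeds f_m."""
    i = bisect.bisect_left(freqs, x)
    return freqs[i] if i < len(freqs) else None

def insert(job, sigma, index):
    """Insert job at `index`, ahead of any entries that already carry that index."""
    pos = bisect.bisect_left([idx for idx, _ in sigma], index)
    sigma.insert(pos, (index, job))

def remove(job, sigma, index):
    """Remove job if it sits at `index`; otherwise do nothing."""
    if (index, job) in sigma:
        sigma.remove((index, job))

def lookup(job, sigma):
    """Index of the first occurrence of job in sigma, or None if absent."""
    for idx, j in sigma:
        if j == job:
            return idx
    return None
```

Using `bisect_left` gives the "inserted before existing entries with the same index" behavior that Insert() requires.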

A description of ReUA at a high level of abstraction is shown in Algorithm 1. In line 3 of Algorithm 1, the procedure offlineComputing(), as shown in Algorithm 2, calculates $D_i$ and $C_i$ for each task. The procedure also computes the optimal frequency $f^o_{T_i}$ for each task $T_i$ that maximizes the task's LoUER. LoUER is defined as $U_i(t + C_i/f) / (C_i \times E(f))$, where $E(f)$ is derived using Equation 1. This calculation is performed at $t = 0$.
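
To make the LoUER-maximizing frequency concrete, here is a small sketch. It rests on assumptions that are not spelled out in this section: the per-cycle energy is taken to have the form $E(f) = S_3 f^2 + S_2 f + S_1 + S_0/f$, the three parameter tuples reflect one reading of the energy settings used in the experiments, and the helper names are hypothetical.

```python
FREQS = [360, 550, 640, 730, 820, 910, 1000]   # MHz
F_MAX = FREQS[-1]

def energy_per_cycle(f, s3, s2, s1, s0):
    # Assumed form of Equation 1: E(f) = S3*f^2 + S2*f + S1 + S0/f
    return s3 * f**2 + s2 * f + s1 + s0 / f

def optimal_freq(tuf, cycles, model, freqs=FREQS):
    """Frequency maximizing LoUER = U(C/f) / (C * E(f)), evaluated at t = 0."""
    def louer(f):
        return tuf(cycles / f) / (cycles * energy_per_cycle(f, *model))
    return max(freqs, key=louer)

# Assumed (S3, S2, S1, S0) settings for the three energy models.
E1 = (1.0, 0, 0, 0)
E2 = (0.75, 0, 0, 0.25 * F_MAX**3)
E3 = (0.5, 0, 0, 0.5 * F_MAX**3)

# A downward step TUF of height 10 whose termination time is 21 time units.
step = lambda t: 10 if t <= 21 else 0
```

With 3600 allocated cycles (in units where `cycles / f` is a time), every frequency finishes before the step, so the choice is driven purely by energy: the lowest frequency wins under the processor-only model, while a larger $S_0$ term pushes the optimum upward.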

Algorithm 1: ReUA: High Level Description

 1: input: T = {T_1, ..., T_n}, J_r = {J_1, ..., J_n'}
 2: output: selected job J_exe and frequency f_exe
 3: offlineComputing(T);
 4: Initialization: t := t_cur, σ := ∅;
 5: switch triggering event do
 6:     case task_release(T_i): C_i^r := C_i;
 7:     case task_completion(T_i): C_i^r := 0;
 8:     otherwise: update C_i^r;
 9: for ∀J_k ∈ J_r do
10:     if !feasible(J_k) then
11:         abort(J_k);
        else
12:         J_k.Dep := buildDep(J_k);
13: for ∀J_k ∈ J_r do
14:     J_k.UER := calculateUER(J_k, t);
15: σ_tmp := sortByUER(J_r);
16: for ∀J_k ∈ σ_tmp from head to tail do
17:     if J_k.UER > 0 then
18:         σ := insertByECF(σ, J_k);
19:     else break;
20: J_exe := headOf(σ);
21: f_exe := decideFreq(T, J_exe, t);
22: return J_exe and f_exe;

When ReUA is invoked at time $t_{cur}$, the algorithm first updates each task's remaining cycles (the switch starting from line 5). The algorithm then checks the feasibility of the jobs. If the earliest predicted completion time of a job is later than its termination time, it can be safely aborted (line 11). Otherwise, ReUA builds the dependency list for the job (line 12).

The UER of each job is computed by the procedure calculateUER(), and the jobs are then sorted by their UERs (lines 14 and 15). In each step of the for loop from line 16 to 19, the job with the largest UER and its dependencies are inserted into $\sigma$, if it can produce a positive UER. The output schedule $\sigma$ is then sorted by the jobs' critical times by the procedure insertByECF().

Algorithm 2: offlineComputing()

 1: input: Task set T; output: D_i, C_i, f^o_{T_i};
 2: D_i := U_i^{-1}(ν_i × U_i^max);
 3: C_i := E(Y_i) + sqrt(ρ_i × Var(Y_i) / (1 − ρ_i));
 4: Decide f^o_{T_i} such that U_i(C_i / f^o_{T_i}) / (C_i × E(f^o_{T_i})) = max_j ( U_i(C_i / f_j) / (C_i × E(f_j)) ), ∀j ∈ {1, 2, ..., m};

Finally, ReUA analyzes the demands of the task set and applies DVS to decide the CPU frequency $f_{exe}$ (line 21).

The selected job $J_{exe}$, which is at the head of $\sigma$, is executed at the frequency $f_{exe}$ (lines 20-22).

3.4.2  Resource  and  Deadlock  Handling 

Before ReUA can compute job partial schedules, the dependency chain of each job must be determined. Algorithm 3 shows this procedure.

Algorithm 3: buildDep()

 1: input: Job J_k; output: J_k.Dep;
 2: Initialization: J_k.Dep := J_k; Prev := J_k;
 3: while reqRes(Prev) ≠ ∅ ∧ Owner(reqRes(Prev)) ≠ ∅ do
 4:     J_k.Dep := Owner(reqRes(Prev)) · J_k.Dep;
 5:     Prev := Owner(reqRes(Prev));

Algorithm 3 follows the chain of resource request/ownership. For convenience, the input job $J_k$ is also included in its own dependency list. Each job $J_l$ other than $J_k$ in the dependency list has a successor job that needs a resource which is currently held by $J_l$. Algorithm 3 stops either because a predecessor job does not need any resource or because the requested resource is free. Note that "·" denotes an append operation. Thus, the dependency list starts with $J_k$'s farthest predecessor and ends with $J_k$.
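
The chain-following step can be sketched as below. This is a minimal sketch, assuming each resource has at most one holder (the single-unit model) and that no cycle exists when it is called; the two dictionaries standing in for reqRes() and Owner() are illustrative.

```python
def build_dep(job, requested, owner):
    """Follow the request/ownership chain; farthest predecessor first, job last.

    requested: dict mapping a job to the resource it is waiting for (absent if none)
    owner:     dict mapping a resource to the job holding it (absent if free)
    Assumes the chain is acyclic, i.e. deadlocks have already been resolved.
    """
    dep = [job]
    prev = job
    while requested.get(prev) is not None and owner.get(requested[prev]) is not None:
        holder = owner[requested[prev]]
        dep.insert(0, holder)    # the holder must run before its successor
        prev = holder
    return dep
```

For example, if J1 waits on a resource held by J2, and J2 waits on one held by J3, the dependency list of J1 is `["J3", "J2", "J1"]`.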

To handle deadlocks, we consider a deadlock detection and resolution strategy, instead of a deadlock prevention or avoidance strategy. Our rationale for this is that deadlock prevention or avoidance strategies normally pose extra requirements; e.g., resources must always be requested in ascending order of their identifiers.

Further, restricted resource access operations that can prevent or avoid deadlocks, as done in many resource access protocols, are not appropriate for the class of embedded real-time systems that we focus on. For example, the Priority Ceiling protocol [32] assumes that the highest priority of jobs accessing a resource is known. Likewise, the Stack Resource policy [6] assumes preemptive "levels" of threads a priori. Such assumptions are too restrictive for the class of systems that we focus on (due to their dynamic nature).

Recall that we are assuming a single-unit resource request model. For such a model, the presence of a cycle in the resource graph is the necessary and sufficient condition for a deadlock to occur. Thus, the complexity of detecting a deadlock can be mitigated by a straightforward cycle-detection algorithm.


Algorithm 4: Deadlock Detection and Resolution

 1: input: Requesting job J_k, t_cur;
    /* deadlock detection */
 2: Deadlock := false;
 3: J_i := Owner(reqRes(J_k));
 4: while J_i ≠ ∅ do
 5:     J_i.LoUER := U_{J_i}(t_cur + J_i.C / f_m) / (J_i.C × E(f_m));
 6:     if J_i = J_k then
 7:         Deadlock := true;
 8:         break;
        else
 9:         J_i := Owner(reqRes(J_i));
    /* deadlock resolution if any */
10: if Deadlock = true then
11:     abort(the job J_m with the minimal LoUER in the cycle);


The deadlock detection and resolution algorithm (Algorithm 4) is invoked by the scheduler whenever a job requests a resource. Initially, there is no deadlock in the system. By induction, it can be shown that a deadlock can occur if and only if the edge that arises in the resource graph due to the new resource request lies on a cycle. Thus, it is sufficient to check if the new edge resulting from the job's resource request produces a cycle in the resource graph.

To resolve the deadlock, some job needs to be aborted. If a job $J_i$ were to be aborted, then its timeliness utility is lost, but energy is still consumed. To minimize such loss, we compute the LoUER of each job at $t_{cur}$ at the frequency $f_m$. ReUA aborts the job with the minimal LoUER in the cycle to resolve a deadlock.
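
The detection and victim-selection steps can be sketched as follows. This is an illustrative sketch only: it assumes single-holder resources, that the system was deadlock-free before the new request (so any cycle passes through the requesting job), and that LoUER values are supplied by a caller-provided function.

```python
def detect_deadlock(job, requested, owner):
    """Return the jobs on the cycle closed by `job`'s request, or [] if there is none.

    requested: dict job -> resource it requests; owner: dict resource -> holder.
    Assumes no deadlock existed before this request.
    """
    cycle = []
    cur = owner.get(requested.get(job))
    while cur is not None:
        cycle.append(cur)
        if cur == job:
            return cycle             # the chain came back to the requester
        cur = owner.get(requested.get(cur))
    return []

def choose_victim(cycle, louer):
    """Job on the cycle with the smallest LoUER; aborting it loses the least."""
    return min(cycle, key=louer)
```

A typical use is to call `detect_deadlock` on every resource request and, when it returns a non-empty cycle, abort `choose_victim(cycle, louer)`.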

3.4.3  Manipulating  Partial  Schedules 


The calculateUER() algorithm (Algorithm 5) accepts a job $J_k$ (with its dependency list) and the current time $t_{cur}$. On completion, the algorithm determines the UER for $J_k$, by assuming that jobs in $J_k.Dep$ are executed from the current position (at time $t_{cur}$) in the schedule, while following the dependencies.


Algorithm 5: calculateUER()

 1: input: J_k, t_cur; output: J_k.UER;
 2: Initialization: C_c := 0, E := 0, U := 0;
 3: for ∀J_l ∈ J_k.Dep, from head to tail do
 4:     C_c := C_c + J_l.C;
 5:     U := U + U_{J_l}(t_cur + C_c / f_m);
 6: E := E(f_m) × C_c;
 7: J_k.UER := U / E;
 8: return J_k.UER;


To compute $J_k$'s UER at time $t_{cur}$, ReUA considers each job $J_l$ that is in $J_k$'s dependency chain, which needs to be completed before executing $J_k$. The total computation cycles that will be executed upon completing $J_k$ is counted using the variable $C_c$ of line 4. With the known expected computation cycles of each task, we can derive the expected completion time and expected energy consumption under $f_m$ for each task, and thus get their accrued utility to calculate the UER for $J_k$.

Thus, the total execution time (under $f_m$) of the job $J_k$ and its dependents consists of two parts: (1) the time needed to execute the jobs holding the resources that are needed to execute $J_k$; and (2) the remaining execution time of $J_k$ itself. According to the process of buildDep(), all the related jobs are included in $J_k.Dep$.

Note that we are calculating each job's UER assuming that the jobs are executed at the current position in the schedule. This would not be true in the output schedule $\sigma$, and thus affects the accuracy of the UERs calculated. But with the non-increasing shape of each job's TUF, we are calculating the highest possible UER of each job by assuming that it is executed at the current position. Intuitively, this would benefit the final UER, since insertByECF() always takes the job with the highest UER at each insertion on $\sigma$. Also, the UER calculated for the scheduled job, which is at the head of the feasible schedule, is always accurate.
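
The UER computation over a dependency chain can be sketched as follows. This is a sketch under stated assumptions: each chain entry is a hypothetical `(remaining_cycles, tuf)` pair, TUFs take an absolute completion time, and the per-cycle energy function is passed in rather than fixed to any particular model.

```python
def calculate_uer(dep, t_cur, f_max, energy_per_cycle):
    """UER of the last job in `dep`, executing its whole dependency chain at f_max.

    dep: list of (remaining_cycles, tuf) pairs ordered head to tail; the chain is
         assumed to contain at least one job with a positive cycle count.
    """
    cycles = 0.0
    utility = 0.0
    for c, tuf in dep:
        cycles += c
        utility += tuf(t_cur + cycles / f_max)   # utility at the predicted completion
    energy = energy_per_cycle(f_max) * cycles
    return utility / energy
```

Because every TUF is evaluated as if the chain started right now, the result is an upper bound on the UER the job can achieve once it is placed later in the schedule, which matches the discussion above.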

The details of insertByECF() in line 18 of Algorithm 1 are shown in Algorithm 6. insertByECF() updates the tentative schedule $\sigma$ by attempting to insert each job, along with all of its dependencies, into $\sigma$. The updated schedule $\sigma$ is an ordered list of jobs, where each job is placed according to the critical time it should meet.


Algorithm 6: insertByECF()

 1: input: J_k and an ordered job list σ
 2: output: the updated list σ
 3: if J_k ∉ σ then
 4:     copy σ into σ_tent: σ_tent := σ;
 5:     Insert(J_k, σ_tent, J_k.D);
 6:     CuCT := J_k.D;
 7:     for ∀J_i ∈ {J_k.Dep − J_k} from tail to head do
 8:         if J_i ∈ σ_tent then
 9:             CT := lookup(J_i, σ_tent);
10:             if CT < CuCT then continue;
11:             else Remove(J_i, σ_tent, CT);
12:         CuCT := min(CuCT, J_i.D);
13:         Insert(J_i, σ_tent, CuCT);
14:     if feasible(σ_tent) then
15:         σ := σ_tent;
16: return σ;


Note that the time constraint that a job should meet is not necessarily the job's critical time. In fact, the index value of each job in $\sigma$ is the actual time constraint that the job must meet.

A job may need to meet an earlier critical time in order to enable another job to meet its time constraint. Whenever a job is considered for insertion in $\sigma$, it is scheduled to meet its own critical time. However, all of the jobs in its dependency list must execute before it can execute, and therefore, must precede it in the schedule. The index values of the dependencies can be changed with Insert() in line 13 of Algorithm 6.

The variable CuCT is used to keep track of this information. Initially, it is set to be the critical time of job $J_k$, which is tentatively added to the schedule (line 6, Algorithm 6). Thereafter, any job in $J_k.Dep$ with a later time constraint than CuCT is required to meet CuCT. If, however, a job has a tighter critical time than CuCT, then it is scheduled to meet the tighter critical time, and CuCT is advanced to that time since all jobs left in $J_k.Dep$ must complete by then (lines 12-13, Algorithm 6). Finally, if this insertion produces a feasible schedule, then the jobs are included in the schedule; otherwise, not (lines 14-15).

It is worth noting that insertByECF() sorts jobs in non-decreasing critical time order if possible, but its sub-procedure feasible() checks the feasibility of $\sigma_{tent}$ based on each job's termination time. This is because a job's critical time is less than or equal to its termination time. So even if a job cannot complete before its critical time, it may still accrue some utility, as long as it finishes before its termination time. Thus, we need to prevent "over-killing" in feasible(). The effectiveness of such prevention is further illustrated in Section 5.3.
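
The insertion procedure can be sketched in Python as follows. This is a sketch of the steps described above, not the authors' code: the schedule is modeled as a sorted list of `(index, job)` pairs, job attributes live in a hypothetical dictionary, and feasibility is checked by running the tentative schedule back to back at the highest frequency against termination times.

```python
import bisect

def _insert(job, sigma, index):
    # Place job ahead of existing entries that share the same index value.
    pos = bisect.bisect_left([i for i, _ in sigma], index)
    sigma.insert(pos, (index, job))

def insert_by_ecf(job, sigma, jobs, t_cur, f_max):
    """jobs: name -> {"D": critical time, "X": termination time,
                      "C": remaining cycles, "dep": [..., name]} (dep ends with the job)."""
    if any(j == job for _, j in sigma):
        return sigma
    tent = list(sigma)                       # tentative copy of the schedule
    cuct = jobs[job]["D"]
    _insert(job, tent, cuct)
    for dep in reversed(jobs[job]["dep"][:-1]):   # tail to head, skipping the job itself
        ct = next((i for i, j in tent if j == dep), None)
        if ct is not None:
            if ct < cuct:
                continue                     # already scheduled early enough
            tent.remove((ct, dep))
        cuct = min(cuct, jobs[dep]["D"])     # dependency must meet the tighter time
        _insert(dep, tent, cuct)
    t = t_cur                                # feasibility against termination times
    for _, j in tent:
        t += jobs[j]["C"] / f_max
        if t > jobs[j]["X"]:
            return sigma                     # reject the insertion
    return tent
```

Note that a rejected insertion returns the original schedule untouched, which is why the work is done on a copy.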

3.4.4  Deciding  the  Processor  Frequency 


ReUA  adopts  a  stochastic  DVS  technique  similar  to 
LaEDF  [31],  as  shown  in  Algorithm  7. 


Algorithm 7: decideFreq()

 1: input: T, J_exe, t_cur; output: f_exe;
 2: Initialization: T.Dep := T; PrevT := T_i;
 3: Util := C_1/D_1 + ... + C_n/D_n;
 4: s := 0;
 5: for i = 1 to n, T_i ∈ {T_1, ..., T_n | D_1^a ≥ ... ≥ D_n^a} do
 6:     /* reverse EDF order of tasks */
 7:     Util := Util − C_i/D_i;
 8:     x := max(0, C_i^r − (f_m − Util) × (D_i^a − D_n^a));
 9:     Util := 1, if D_i^a − D_n^a = 0;
        Util := Util + (C_i^r − x)/(D_i^a − D_n^a), otherwise;
10:     s := s + x;
11: f := min(f_m, s/(D_n^a − t_cur));
12: f_exe := selectFreq(f);
13: f_exe := max(f_exe, f^o_{T(J_exe)});


ReUA keeps track of the remaining computation cycles $C_i^r$, as updated from line 5 to line 8 of Algorithm 1. Unlike LaEDF, ReUA uses the aggregate CPU demand shown in Equation 3 during the process of DVS. From line 3 to line 11, the algorithm considers the interval until the next task critical time and tries to "push" as much work as possible beyond the critical time. The algorithm considers the tasks in latest-critical-time-first order in line 5.

$x$ is the minimum number of cycles that the task must execute before the closest critical time, $D_n^a$, in order for it to complete by its own critical time (line 8), assuming worst-case aggregate CPU demand Util by tasks with earlier critical times. The aggregate demand Util is adjusted to reflect the actual demand of the task for the time after $D_n^a$ (line 9). $s$ is simply the sum of the $x$ values calculated for all of the tasks, and therefore reflects the minimum number of cycles that must be executed by $D_n^a$ in order for all tasks to meet their critical times (line 10). In line 11, the operating CPU frequency is set just fast enough to execute $s$ cycles over this interval.

Thus, decideFreq() capitalizes on early task completion by deferring work for future tasks in favor of scaling the current task. In addition, in line 9, we consider the case that jobs of different tasks have the same absolute critical times, which sometimes occurs, especially during overloads. Also, it is possible that during overloads, the required frequency may be higher than $f_m$ and selectFreq() would fail to return a value. In line 11, we solve this by setting the upper limit of the required frequency to be $f_m$.

Finally, in line 13, the result of selectFreq() is compared with $T(J_{exe})$'s optimal frequency decided in offlineComputing(). The higher frequency is selected to preserve the statistical performance guarantee and maximize system-level UER.
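
The frequency decision can be sketched as below. This is a hedged sketch, not the paper's exact procedure: the deferral update follows the look-ahead EDF pattern (add the deferred cycles spread over the gap to the next critical time), the dictionary keys are hypothetical, and the zero-gap case simply skips the update, since every task processed after that point also has a zero gap and its `x` equals its remaining cycles regardless of `util`.

```python
FREQS = [360, 550, 640, 730, 820, 910, 1000]   # MHz, ascending

def decide_freq(tasks, t_cur, freqs, f_opt_exe):
    """tasks: dicts with C (allocated cycles), D (relative critical time),
    Cr (remaining cycles), Da (absolute critical time, later than t_cur)."""
    f_max = freqs[-1]
    util = sum(tk["C"] / tk["D"] for tk in tasks)
    ordered = sorted(tasks, key=lambda tk: tk["Da"], reverse=True)  # latest first
    d_next = ordered[-1]["Da"]                   # closest critical time
    s = 0.0
    for tk in ordered:
        util -= tk["C"] / tk["D"]
        gap = tk["Da"] - d_next
        # Cycles of this task that cannot be deferred past d_next.
        x = max(0.0, tk["Cr"] - (f_max - util) * gap)
        if gap > 0:
            util += (tk["Cr"] - x) / gap         # demand of the deferred part
        s += x
    f = min(f_max, s / (d_next - t_cur))         # cap at the highest frequency
    f_exe = next(fr for fr in freqs if f <= fr)  # lowest frequency not below f
    return max(f_exe, f_opt_exe)                 # never below the job's local optimum
```

In an underloaded example most of the later task's work is deferred, so a low frequency suffices; once the remaining cycles of the earliest task exceed what $f_m$ can deliver, the cap returns the highest frequency.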

4.  PROPERTIES  OF  REUA 
4.1  Non-Timeliness  Properties 

We now discuss ReUA's non-timeliness properties including deadlock-freedom, correctness, and mutual exclusion.

ReUA respects resource dependencies by ensuring that the job selected for execution can execute immediately. Thus, no job is ever selected for normal execution if it is resource-dependent on some other job.

Theorem 1. ReUA ensures deadlock-freedom.

Proof. A cycle in the resource graph is the sufficient and necessary condition for a deadlock in the single-unit resource request model. ReUA does not allow such a cycle by deadlock detection and resolution; so it is deadlock free. □

Lemma 2. In insertByECF()'s output, all the dependents of a job must execute before it can execute, and therefore, must precede it in the schedule.

Proof. insertByECF() seeks to maintain an output queue ordered by jobs' critical times, while respecting resource dependencies. Consider job $J_k$ and its dependent $J_i$. If $J_i.D$ is earlier than $J_k.D$, then $J_i$ will be inserted before $J_k$ in the schedule. If $J_i.D$ is later than $J_k.D$, $J_i.D$ is advanced to $J_k.D$ by the operation on CuCT. According to the definition of Insert(), after advancing the critical time, $J_i$ will be inserted before $J_k$. □

Theorem  3.  When  a  job  Jk  that  requests  a  resource  R 
is  selected  for  execution  by  ReUA,  Jk  ’s  requested  resource 
R  will  be  free.  We  call  this  ReUA’s  correctness  property. 

Proof. From Lemma 2, the output schedule $\sigma$ is correct. Thus, ReUA is correct. □

Thus, if a resource is not available for a job $J_k$'s request, jobs holding the resource will become $J_k$'s predecessors. We present ReUA's mutual exclusion property by a corollary.

Corollary 4. ReUA satisfies mutual exclusion constraints in resource operations.

4.2  Timeliness  Properties 

We consider timeliness properties under no resource dependencies, where ReUA can be compared with a number of well-known algorithms. Specifically, we consider the following two conditions: (1) a set of independent periodic tasks, where each task has a single computational thread with a downward step TUF (such as the one shown in Figure 1(d)); and (2) there are sufficient processor cycles for meeting all task termination times, i.e., there is no overload.

Theorem 5. Under conditions (1) and (2), a schedule produced by EDF [17] is also produced by ReUA, yielding equal total utilities. Not coincidentally, this is simply a termination-time ordered schedule.

Proof. We prove this by examining Algorithms 1 and 6. For a job $J$ without dependencies, $J.Dep$ only contains $J$ itself. For periodic tasks with step TUFs, a task's critical time is the same as its termination time. During non-overload situations, $\sigma$ from line 18 of Algorithm 1 is termination-time ordered.

The TUF termination time that we consider is analogous to a deadline in [17]. As proved in [17,23], a deadline-ordered schedule is optimal (with respect to meeting all deadlines) when there are no overloads. Thus, $\sigma$ yields the same total utility as EDF. □

Some important corollaries about ReUA's timeliness behavior during non-overload situations can be deduced from EDF's optimality [11].

Corollary 6. Under conditions (1) and (2), ReUA always meets all task termination times.

Corollary 7. Under conditions (1) and (2), ReUA yields the minimum possible maximum lateness.

ReUA also provides statistical performance guarantees under certain conditions. With condition (1), the utility requirement of a task can only take $\nu_i = 0$ or $\nu_i = 1$. From Corollary 6, we can derive the properties of ReUA on performance guarantees.

Theorem 8. Under conditions (1) and (2), ReUA meets all statistical performance requirements.

Proof. From Corollary 6, under conditions (1) and (2), ReUA can meet all task termination times. This ensures that $\nu_i = 1$ can be satisfied for each task. Based on the results of Equation 2, with probability at least $\rho_i$, the processor cycles demanded by a job of task $T_i$ are less than the allocated cycles. From Corollary 6, all the allocated cycles can be completed before their termination times. Thus, for task $T_i$, ReUA can meet at least a fraction $\rho_i$ of the termination times; i.e., ReUA accrues $\nu_i$ utility with a probability of at least $\rho_i$. □

From Theorem 8, we can derive its counterpart for non-increasing TUFs with the definitions of Equations 4 and 5.

Theorem 9. For a set of independent periodic tasks, where each task has a single computational thread with a non-increasing TUF, $Cload \le 1$ is a sufficient condition for ReUA to meet all statistical performance requirements.

Proof. With $\nu_i$ and $\rho_i$ of task $T_i$, ReUA converts the performance guarantee problem to the problem of meeting critical times. If $Cload \le 1$, according to the result of Theorem 8, the assertion holds. □

Note that Theorem 9 only states that $Cload \le 1$ is a sufficient condition; it is not a necessary one. We illustrate this with an example in Section 5.

5. EXPERIMENTAL RESULTS

In order to experimentally evaluate the performance of ReUA, we developed a simulator for the operation of hardware capable of DVS, and performed extensive simulations. We first present the simulation methodology, and then discuss the results.

5.1 Simulation Methodology

Our simulator is written with the simulation tool OMNET++ [34], which provides a discrete event simulation environment. The simulator takes as input a task set, specified with the period or minimum inter-arrival time (abbreviated as P/I.A.), and real-time requirements. The tasks' time constraints (i.e., TUFs) and the means/variances of the cycle demands are also specified as input. The tasks contained in a task set G are selected from Table 1. The table also summarizes these tasks' input parameters.

Table 1: Experimental Tasks


Task   Jobs   P/I.A.   TUF
T1     130    21       step, height = 10
T2     124    22       step, height = 80
T3     137    20       step, height = 10
T4     109    25       step, height = 80
T5     130    21       -0.025t^2 + 10 if 0 < t < 20; 0 otherwise
T6     124    22       -4t + 80 if 0 < t < 20; 0 otherwise
T7     137    25       -0.01t^2 - 0.15t + 10 if 0 < t < 25; 0 otherwise
T8     124    21       -0.5t + 10 if 0 < t < 20; 0 otherwise
T9     124    20       the same as T8's
T10    124    25       the same as T8's

Figure 2: Normalized UER vs. Load with Step TUFs under Various Energy Model Settings


We change the tasks' cycle demands to change the system load (Load) as defined in Equation 4. For each demand $Y_i$, we keep $Var(Y_i) \approx E(Y_i)$, and generate normally-distributed cycle demands.
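
A demand generator of this kind, together with a cycle allocation that meets a probability target, can be sketched as follows. This is an illustration under assumptions: the allocation rule shown is the one-sided Chebyshev (Cantelli) bound, which guarantees $P(Y < c) \ge \rho$ for any distribution with the given mean and variance; whether it matches the paper's allocation equation exactly is assumed here, and the function names are hypothetical.

```python
import math
import random

def allocate_cycles(mean, var, rho):
    """Cycle budget c with P(Y < c) >= rho by Cantelli's inequality."""
    return mean + math.sqrt(rho * var / (1.0 - rho))

def sample_demands(mean, n, seed=0):
    """Normally distributed demands with Var(Y) = E(Y), clipped at zero."""
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(mean, math.sqrt(mean))) for _ in range(n)]

budget = allocate_cycles(100.0, 100.0, 0.96)
demands = sample_demands(100.0, 10000)
covered = sum(d < budget for d in demands) / len(demands)
```

For a mean and variance of 100 and $\rho = 0.96$ the budget is about 149 cycles, and the empirical fraction `covered` comfortably exceeds 0.96 because the bound is distribution-free and therefore conservative for normal demands.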

The energy consumption per cycle at a particular frequency is calculated using Equation 1. In practice, the $S_3$, $S_2$, $S_1$, and $S_0$ terms depend on the power management state of the system and its subsystems. For example, if a laptop has its display on, the $S_0$ term will be large relative to the others. But if the display has been turned off, the $S_0$ term will be much smaller. Different types of systems will also have different relative values for the $S$ terms. The $S_0$ term is probably a much larger fraction of the total power in a PDA than it is in a laptop [26,27,36].

We use experimental settings that are similar to those in Martin's PhD thesis [26], but de-normalize the terms. For comparison, the experiments are carried out under three energy model settings, as shown in Table 2. Note that $E_1$ is the same as the traditional energy model, which only considers the energy consumed by the processor.


Table 2: Energy Model Settings

Energy Model   S3     S2   S1   S0
E1             1.0    0    0    0
E2             0.75   0    0    0.25 f_m^3
E3             0.5    0    0    0.5 f_m^3

Other parameters that are supplied to the simulator include the processor specification. We consider a processor that supports seven different frequencies: {360, 550, 640, 730, 820, 910, 1000 MHz}. These frequencies reflect the settings available on a platform incorporating an AMD K6 processor with AMD's PowerNow! mechanism [3].

In addition to ReUA, we implemented the following schemes for comparison: BaseEDF, LaEDF, StaticEDF, and LaEDF-NA.

BaseEDF is the EDF scheduler without any DVS support; it always uses the highest frequency. LaEDF is the Look-Ahead RT-DVS for EDF scheduler in [31]. StaticEDF uses the constant speed given by Equation 3, rounded up (a "ceiling") to the lowest suitable frequency in $\{f_1, f_2, \cdots, f_m\}$. StaticEDF switches to the lowest frequency whenever there is no ready task. Combining the static schemes in [5] and [31], StaticEDF is the static optimal solution to the DVS problem for the periodic task model with step TUFs under the available frequency set. These three schemes abort infeasible tasks during overloads. In contrast, LaEDF-NA is LaEDF with no abortion.

LaEDF,  LaEDF-NA,  and  StaticEDF  perform  DVS  on 
periodic  tasks  with  known  worst-case  workload,  which  is 
unavailable  in  our  application  model.  Thus,  we  use  the 
minimum  inter-arrival  time  and  cycles  allocated  by  ReUA 
as  their  inputs. 

5.2  Impact  of  Energy  Models 

In our first set of simulation experiments, we determine the effects of our new energy model. We consider the task set $G_1 = \{T_1, T_2, T_3, T_4\}$, and apply different schemes on $G_1$ under different energy settings. We consider downward step TUFs, since all the other algorithms compared can only deal with deadlines. Each task $T_i$ has the statistical performance requirement of $\nu_i = 1$ and $\rho_i = 0.96$.

Figure 2 shows the UER for all the DVS schemes normalized to BaseEDF under energy model settings $E_1$, $E_2$, and $E_3$, as Load varies from 0.2 to 1.8. We observe that under all three energy settings, ReUA performs the best among all strategies under all loads, and especially during overloads. We also observe that LaEDF-NA yields almost zero UER during overloads.

As the figure shows, during overloads, the normalized UERs produced by LaEDF, StaticEDF, and BaseEDF converge to 1. This is because all three algorithms select the highest frequency by DVS calculation during overloads, and do not differ in scheduling. As the term $S_0$ in the energy model increases, ReUA adjusts the selected frequency to accrue more UER. This effect is more pronounced under $E_3$, where LaEDF, LaEDF-NA, and StaticEDF perform worse than BaseEDF, while ReUA still outperforms BaseEDF at all loads.

We speculate that the UER gap between ReUA and the other schemes arises because, during overloads, ReUA saves more energy and accrues higher utility. Our speculation is verified in Figure 3, which shows the accrued utility and energy consumption normalized to BaseEDF, under energy model setting $E_2$.

Figure 3: Normalized Energy and AUR vs. Load with Step TUFs under Energy Setting $E_2$. (a) Normalized AUR vs. Load. (b) Normalized Energy vs. Load.

From Figure 3(a), we observe that during under-loaded situations, all schemes accrue the same (optimal) utility because of EDF's optimality [11] during such situations. But during overload situations, LaEDF-NA suffers domino effects and accrues almost no utility [24]. On the other hand, ReUA seeks to schedule jobs with higher UERs, and thus accrues remarkably higher utility than the others.

In Figure 3(b), during under-loads, we observe that ReUA saves more energy than the other schemes. Further, this portion of the curves is nearly symmetric to the corresponding portion of Figure 2(b). The energy consumption of LaEDF-NA increases linearly with Load, because it performs no abortion and executes every job that arrives.

Since no strategies except ReUA consider the system-level energy consumption, we only use the energy model $E_1$ in our further simulation experiments.

5.3  Performance  Guarantee 

To evaluate the statistical performance guarantees provided by
ReUA, we first consider the task set G1 with the performance
requirements $\{(\nu_i = 1, p_i = 0.96),\ i = 1, \ldots, 4\}$.

Figure 4 shows the accrued utility ratio (AUR) and critical-time
meet ratio (DMR) of each task under increasing Load. AUR is the
ratio of accrued aggregate utility to the maximum possible
utility, and DMR is the ratio of the jobs meeting their critical
times to the total job releases of a task. For a task with a
downward step TUF, its AUR and DMR are identical; so we show them
in one plot.
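
As a minimal sketch of these two metrics, the following Python snippet computes AUR and DMR for one task from a list of per-job records. The record layout (`JobRecord` and its fields) and the sample values are illustrative assumptions, not the simulator used in the experiments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobRecord:
    """One released job of a task (hypothetical record layout)."""
    accrued_utility: float    # utility actually accrued (0 if aborted)
    max_utility: float        # peak of the task's TUF
    met_critical_time: bool   # completed at or before its critical time


def aur(jobs: List[JobRecord]) -> float:
    """Accrued utility ratio: accrued aggregate utility / maximum possible utility."""
    return sum(j.accrued_utility for j in jobs) / sum(j.max_utility for j in jobs)


def dmr(jobs: List[JobRecord]) -> float:
    """Critical-time meet ratio: jobs meeting critical times / total job releases."""
    return sum(j.met_critical_time for j in jobs) / len(jobs)


# Downward step TUF of height 10: each job accrues all of the utility or none.
step_jobs = [JobRecord(10.0, 10.0, True)] * 3 + [JobRecord(0.0, 10.0, False)]
print(aur(step_jobs), dmr(step_jobs))  # 0.75 0.75
```

With a downward step TUF, a job accrues its full utility exactly when it meets its critical time, which is why the two ratios coincide in this example.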

As Figure 4(a) shows, with ReUA during under-loads, all tasks
accrue 100% AUR and DMR, except task T1, whose AUR and DMR are
99.23% at Load = 0.3. Thus, ReUA delivers the statistical
performance guarantee of accruing 100% of each task's maximum
utility with a probability of at least 96% for all tasks. This
also validates Theorem 8.

Figure 4: AUR and DMR vs. Load of G1 under E1


Figure 5: AUR and DMR vs. Cload of G2 under E1

Comparing the results during overloads in Figures 4(a) and 4(b),
we observe that ReUA still achieves near 100% AUR/DMR for tasks
T2 and T4, but achieves lower AUR/DMR for T1 and T3. On the other
hand, LaEDF decreases the AUR/DMR of T2 and T4 more than that of
the other two. This is because T2 and T4 have TUFs with higher
“heights” and thus higher utility; so ReUA accrues more
system-wide utility by completing these tasks before their
termination times. Schemes based on EDF cannot make such
scheduling decisions: T2 and T4 are not favored by LaEDF since
they have longer critical times than T1 and T3. We show the
comparison of utility accrual for various schemes in Section 5.4.

Besides G1, we also consider the task set
$G_2 = \{T_3, T_5, T_6, T_7\}$, which contains linear-shaped and
parabolic-shaped TUFs (with non-increasing portion) as well as
step TUFs. The performance requirements of G2 are
$\{(\nu_3 = 1.0, p_3 = 0.80), (\nu_5 = 0.55, p_5 = 0.80), (\nu_6 = 0.5, p_6 = 0.80), (\nu_7 = 0.55, p_7 = 0.80)\}$.

Figure  5  shows  the  AUR  and  DMR  of  each  task  in  G2 
with  Cload  varying  from  0.7  to  2.0.  System  Load  also 
changes  with  Cload,  and  the  corresponding  values  are 
shown  in  Table  3. 


Table 3: Cload and Load

Cload   0.7    0.8    0.9    1.0    1.1    1.2    1.3
Load    0.44   0.5    0.57   0.6    0.7    0.76   0.83

Cload   1.4    1.5    1.6    1.7    1.8    1.9    2.0
Load    0.89   0.95   1.01   1.06   1.13   1.2    1.26

Figure 6(a): Normalized UER vs. PHR under E1 (G3: Step TUFs)

We consider task T7 as an example to illustrate how ReUA delivers
statistical performance guarantees. As shown in Figure 5, when
Cload < 1, task T7 is guaranteed to accrue at least
$\nu_7 = 55\%$ of its maximum utility with a probability no less
than $p_7 = 80\%$. For example, at Cload = 1, ReUA accrues
AUR = 86.97% and DMR = 100%, which implies that it can complete
all the demanded cycles of the task before their critical times.
Furthermore, 86.97% of the task's maximum utility is accrued with
a probability of 100%, well above the performance requirements.
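
One way to read such a requirement empirically is sketched below in Python: the requirement holds over a run if at least a fraction p of a task's jobs accrue at least a fraction nu of the maximum utility. The function name `satisfies_requirement` and the sample values are hypothetical and are not taken from the experiments.

```python
from typing import Sequence


def satisfies_requirement(utility_fractions: Sequence[float], nu: float, p: float) -> bool:
    """True if at least a fraction p of the jobs accrued >= nu of the maximum utility."""
    hits = sum(1 for u in utility_fractions if u >= nu)
    return hits / len(utility_fractions) >= p


# Hypothetical per-job accrued-utility fractions for one task (not measured data).
fractions = [0.9, 0.8, 0.6, 0.95, 0.4, 0.7, 0.85, 0.9, 0.75, 0.65]

print(satisfies_requirement(fractions, nu=0.55, p=0.80))  # True: 9 of 10 jobs reach 55%
print(satisfies_requirement(fractions, nu=0.70, p=0.80))  # False: only 7 of 10 reach 70%
```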

But Cload < 1 is not a necessary condition for delivering
statistical performance guarantees. For example, at Cload = 1.6
and Load = 1.02, task T7 can still accrue AUR = 71.21% and
DMR = 89.91%. This is because, for a task with a non-step and
non-increasing TUF, even if the task misses its critical time, it
can complete before its termination time and accrue some amount
of utility, which depends on the TUF shape. Therefore, these
experiments validate Theorem 9.
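
The effect of the TUF shape on a late completion can be illustrated with a short Python sketch that contrasts a downward step TUF with a linearly decreasing one. The function names and the parameter values (height 40, critical time 9, termination time 20) are hypothetical and chosen only for illustration.

```python
def step_tuf(t: float, height: float, critical_time: float) -> float:
    """Downward step TUF: full utility up to the critical time, zero afterwards."""
    return height if 0.0 <= t <= critical_time else 0.0


def linear_tuf(t: float, height: float, termination_time: float) -> float:
    """Linearly decreasing TUF: `height` at t = 0, falling to zero at the termination time."""
    if t < 0.0 or t >= termination_time:
        return 0.0
    return height * (1.0 - t / termination_time)


# Hypothetical parameters, not taken from the task sets in the experiments.
height, critical_time, termination_time = 40.0, 9.0, 20.0
late = 10.0  # completion after the critical time but before the termination time

print(step_tuf(late, height, critical_time))       # 0.0
print(linear_tuf(late, height, termination_time))  # 20.0
```

A job that finishes at the same late instant accrues nothing under the step TUF but half of its maximum utility under the linear TUF, so its AUR can remain above its DMR.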

Another major pattern that can be observed from Figure 5 is that,
as Cload and Load increase, task T3 with a step TUF accrues more
AUR and DMR than the other tasks with non-step TUFs. This is
because T3’s full utility can be accrued as long as it is
completed before its termination time, while completing other
tasks just before their termination times may result in very low
utility. In addition, among tasks T5, T6, and T7 with non-step
TUFs, the one with the highest maximum utility, i.e., Tg, is
favored by ReUA to accrue more system-wide utility.

5.4  Effectiveness  of  Utility  Accrual 

From the experiments of the previous sections, we observe that
ReUA mimics the behavior of EDF during under-loaded situations.
During overloads, all schemes tend to select fm as the execution
frequency by DVS, and thus have the same energy consumption.
Thus, the higher UER that ReUA produces relative to the others is
due to the fact that ReUA seeks to accrue more utility during
such situations. In this section, we vary the TUF shape of each
task to demonstrate ReUA’s utility accrual capability.

Figure 6(b): Normalized UER vs. PHR under E1 (G4: Linear TUFs)

We roughly define the ratio of the maximum and minimum heights of
TUFs in a task set as the peak height ratio (or PHR). We consider
two task sets G3 and G4 with step TUFs and linear TUFs,
respectively. G3 is the set G3 = {T1, T2, T3, T4}, where the
heights of U2 and U4 are varied from 10 to 100. G4 is the set
G4 = {Tg, Tg, Tg, T10}, where the crossing points of the utility
axis with f/g and U10 are varied from 10 to 100. In addition, the
intersections of all TUFs in G4 with the time axis are maintained
at t = 20. Thus, both G3 and G4 have PHRs varying from 1 to 10.
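
The PHR definition above can be sketched in a few lines of Python. The sketch assumes a four-task set in which two heights stay fixed at 10 while the other two are varied together from 10 to 100; the fixed value of 10 is an assumption inferred from the stated PHR range of 1 to 10.

```python
from typing import Sequence


def peak_height_ratio(heights: Sequence[float]) -> float:
    """Ratio of the maximum to the minimum TUF height in a task set."""
    return max(heights) / min(heights)


# Hypothetical four-task set: two heights fixed at 10 (assumed), two varied.
for varied in (10.0, 50.0, 100.0):
    heights = [10.0, varied, 10.0, varied]
    print(varied, peak_height_ratio(heights))
# 10.0 1.0
# 50.0 5.0
# 100.0 10.0
```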

Figure 6(a) shows the UERs for ReUA and LaEDF, normalized to
LaEDF, under G3 with Load = 1.5. During overloads, LaEDF,
StaticEDF, and BaseEDF yield the same performance; so we only
show LaEDF here. We observe that, at PHR = 1, ReUA makes the same
scheduling decisions as LaEDF. But as PHR increases, ReUA obtains
higher system-level UER than LaEDF.

Figure 6(b) shows the normalized UERs for ReUA and LaEDF under G4
with Load = 1.5 and Cload = 1.85. We observe trends similar to
those in Figure 6(a), but with a larger performance gap as PHR
increases. The two strategies’ different scheduling criteria
result in different performance even at PHR = 1.


Figure 7(a): Normalized UER vs. Load of G1 with Resource Dependencies under E1

Since not all critical times can be satisfied during overloads,
ReUA considers the UER of each job and seeks to schedule jobs
with high UERs while maintaining the critical time order of jobs
at the same time. But LaEDF simply schedules according to tasks’
critical times, and conforms to the critical time order. In
addition, during overloads, ReUA tends to abort jobs with low
UERs in the feasibility check. This results in higher
system-level utility than that obtained by LaEDF, which always
aborts jobs with the largest critical time.
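
The difference between the two abort rules can be illustrated with a simplified Python sketch. This is not ReUA or LaEDF itself: it assumes a single processor at a fixed frequency, known execution times, and a feasibility check that repeatedly sheds one job until the remaining jobs meet their critical times in critical-time order. The `Job` fields and all values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Job:
    name: str
    critical_time: float
    exec_time: float
    utility: float
    energy: float

    @property
    def uer(self) -> float:
        """Utility per unit energy (assumed definition of a job's UER)."""
        return self.utility / self.energy


def feasible(jobs: List[Job]) -> bool:
    """True if running the jobs in critical-time order meets every critical time."""
    finish = 0.0
    for job in sorted(jobs, key=lambda j: j.critical_time):
        finish += job.exec_time
        if finish > job.critical_time:
            return False
    return True


def shed(jobs: List[Job], victim: Callable[[List[Job]], Job]) -> List[Job]:
    """Abort jobs chosen by `victim` until the remaining set is feasible."""
    kept = list(jobs)
    while kept and not feasible(kept):
        kept.remove(victim(kept))
    return kept


def lowest_uer(jobs: List[Job]) -> Job:
    return min(jobs, key=lambda j: j.uer)


def largest_critical_time(jobs: List[Job]) -> Job:
    return max(jobs, key=lambda j: j.critical_time)


jobs = [
    Job("A", critical_time=4.0, exec_time=3.0, utility=5.0, energy=3.0),
    Job("B", critical_time=6.0, exec_time=3.0, utility=10.0, energy=3.0),
    Job("C", critical_time=8.0, exec_time=3.0, utility=90.0, energy=3.0),
]
by_uer = shed(jobs, lowest_uer)
by_ct = shed(jobs, largest_critical_time)
print([j.name for j in by_uer], sum(j.utility for j in by_uer))  # ['B', 'C'] 100.0
print([j.name for j in by_ct], sum(j.utility for j in by_ct))    # ['A', 'B'] 15.0
```

In this overloaded example, shedding the job with the lowest UER keeps the high-utility job C, whereas shedding the job with the largest critical time discards it.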

5.5  Results  under  Resource  Dependency 

To construct dependent task sets, we consider task sets G1 and G2
and have each job randomly request and release resources from
some available set of resources during the job’s life cycle. The
resource request and release times are uniformly distributed
within a job’s life cycle.
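
A minimal Python sketch of such a workload generator is shown below. It assumes that each resource is requested by a given job with probability 0.5 and that the request and release instants are drawn independently and then ordered; these choices, the function name, and the resource names are illustrative assumptions rather than the settings of the experiments.

```python
import random
from typing import List, Tuple


def random_resource_intervals(
    job_start: float,
    job_end: float,
    resources: List[str],
    rng: random.Random,
) -> List[Tuple[str, float, float]]:
    """For a random subset of `resources`, draw a (request, release) time pair
    uniformly within the job's life cycle [job_start, job_end]."""
    intervals = []
    for res in resources:
        if rng.random() < 0.5:  # assumed selection probability
            a = rng.uniform(job_start, job_end)
            b = rng.uniform(job_start, job_end)
            intervals.append((res, min(a, b), max(a, b)))  # request precedes release
    return intervals


rng = random.Random(0)  # fixed seed so the sketch is repeatable
for res, t_req, t_rel in random_resource_intervals(0.0, 20.0, ["R1", "R2", "R3"], rng):
    print(f"{res}: request at {t_req:.2f}, release at {t_rel:.2f}")
```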

We conducted experiments on the task sets, which are scheduled by
ReUA under no resources, three shared resources, and five shared
resources. Figure 7(a) shows UERs normalized to the case of G1
with no resources, as Load varies from 0.2 to 1.8. Figure 7(b)
shows the same metric for G2, as Cload varies from 0.7 to 2.0.

From the figures, we observe that when Load or Cload increases,
the performance of ReUA on dependent task sets decreases. The
higher the number of shared resources, the larger the performance
decrease. This is because ReUA respects resource dependencies in
scheduling, which in the worst case may cause jobs to be executed
in the reverse order of UERs or critical times. So with dependent
task sets, ReUA cannot provide performance guarantees and suffers
UER losses, especially during high loads.

Figure 7: Normalized UER with Resource Dependencies under E1

However, even at very high Load or Cload and with five shared
resources, the normalized UERs of ReUA on the independent task
sets exceed those on the dependent task sets by no more than 10%.
This is because ReUA aborts a task when its expected completion
time is later than its termination time. Thus, the job queue seen
by the ReUA scheduler at any scheduling event has a length no
more than the number of tasks. With our experimental settings, we
see only limited performance loss in our simulation, but we
expect a larger performance drop with larger task sets.

6.  CONCLUSIONS,  FUTURE  WORK 

This paper presents the design and evaluation of ReUA, a
resource-constrained, energy-efficient, utility-accrual real-time
scheduling algorithm for mobile embedded systems. ReUA considers
application activities that are subject to TUF time constraints,
resource dependencies, and system-level energy consumption
concerns.

The key underpinning of ReUA is the observation that embedded
real-time applications usually exhibit large variations in their
actual cycle demands. This provides opportunities for providing
statistical timeliness performance guarantees, while respecting
resource dependencies, and for improving system-level energy
efficiency. To realize this, the algorithm statistically
allocates cycles to individual application tasks and executes
their allocated cycles at different speeds with DVS. ReUA makes
such stochastic decisions based on the statistical properties of
the task demands. During overload situations, the algorithm
heuristically schedules tasks to maximize collective utility so
as to improve system-level energy efficiency.

We establish several timeliness and non-timeliness properties of
the algorithm such as timeliness optimality during under-loads,
deadlock-freedom, correctness, and mutual exclusion. Our
simulation experiments illustrate that ReUA provides statistical
performance guarantees when possible and improves system-level
energy efficiency.

Several aspects of the work are interesting directions for
further research. One direction is to consider the multi-unit
resource request model [6]. Another direction is to allow
aperiodic tasks with unknown inter-arrival times.